Getting Your Work to Work
in Academia (and Beyond!)

RSS Conference 2023

Zak Varty

Talk Structure

  1. Replicable vs Reproducible work
  2. Reproducible Research
  3. Reproducible Teaching
  4. Reproducible Service

A combination of theft, anecdotes and commentary.

Some awesome people this talk is riffing on

Replication vs Reproducibility

Replication

Replicable: if the experiment were repeated by an independent investigator, you would get slightly different data, but would the substantive conclusions be the same?

  • In the specific sense, this is the core worry for a statistician!

  • Also used more generally: are results stable to perturbations in population / study design / modelling / analysis?

  • Only real test is to try it. Control risk with shadow and parallel deployment.
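A true replication needs new data, but one cheap proxy for "stable to perturbations" is to refit on bootstrap resamples of the data you already have and see how much the estimate moves. The sketch below assumes that resampling with replacement is an acceptable stand-in for perturbing the data. It says nothing about perturbations to study design or modelling choices. The data and the `bootstrap_estimates` helper are hypothetical.

```python
import random
import statistics

def bootstrap_estimates(data, estimator, n_resamples=1000, seed=2023):
    """Re-apply `estimator` to `n_resamples` resamples of `data`, drawn with replacement."""
    rng = random.Random(seed)  # local, seeded generator so the check itself is reproducible
    n = len(data)
    return [estimator(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical measurements, for illustration only.
data = [4.1, 3.9, 5.2, 4.8, 4.4, 5.0, 3.7, 4.6, 4.9, 4.2]

estimates = bootstrap_estimates(data, statistics.mean)
cuts = statistics.quantiles(estimates, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]
print(f"sample mean {statistics.mean(data):.2f}, "
      f"95% percentile interval ({lower:.2f}, {upper:.2f})")
```

A wide interval relative to the effect you care about is a warning sign that a replication might not reach the same conclusion.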

Reproducibility

Reproducible: given the original raw data and code, can you get all of the results again?

  • Reproducible != Correct

  • “Code available on request” is the new “Data available on request”

  • Reproducible data analysis requires effort, time and skill.
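One way to check "can you get all of the results again?" is to rerun the analysis from the raw data and compare a fingerprint of the outputs. The sketch below assumes the results fit in a JSON-serialisable dictionary. `run_analysis`, `fingerprint` and the data are hypothetical stand-ins for a real pipeline.

```python
import hashlib
import json
import random
import statistics

# Hypothetical raw data standing in for the original data set.
raw_data = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 12.6, 11.0, 9.5]

def run_analysis(data, seed=42):
    """Hypothetical analysis: summary statistics plus a seeded random subsample."""
    rng = random.Random(seed)
    subsample = rng.sample(data, k=5)
    # Rounding reduces the chance that harmless last-digit floating-point
    # differences break the comparison.
    return {
        "n": len(data),
        "mean": round(statistics.mean(data), 6),
        "sd": round(statistics.stdev(data), 6),
        "subsample_mean": round(statistics.mean(subsample), 6),
    }

def fingerprint(results):
    """SHA-256 of a canonical JSON encoding (sorted keys), so key order cannot change the hash."""
    canonical = json.dumps(results, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

first = fingerprint(run_analysis(raw_data))
second = fingerprint(run_analysis(raw_data))
print("reproduced" if first == second else "results differ")
```

Matching fingerprints only show that the results were regenerated; they say nothing about whether the analysis is right (Reproducible != Correct).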

Reproducibility is not Binary


A study is reproducible if you can take the original data and the computer code used to analyze the data and recreate all of the numerical findings from the study.

Broman et al. (2017) “Recommendations to Funding Agencies for Supporting Reproducible Research”

What can we do?

  • The replication crisis requires system-level solutions and cultural change.
    • Publishing null results
    • Documenting our forking paths
    • Funding replications as well as novelty
  • Reproducibility we can work on at the individual level.
    • All the standard coding stuff: seeds, projects, portable file paths
    • Who, what, when, where, why and how?
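A minimal Python sketch of "the standard coding stuff", assuming the working directory is the project root (a rough analogue of an RStudio project). It shows a recorded seed, a portable path built with `pathlib`, and a small who/what/when/where/why/how record to store next to the outputs. `SEED`, `project_path`, `run_metadata` and the file layout are all hypothetical.

```python
import json
import os
import platform
import random
from datetime import datetime, timezone
from pathlib import Path

SEED = 2023  # hypothetical fixed seed; record it alongside the outputs

def project_path(*parts, root=None):
    """Build a path relative to the project root (default: the current working
    directory) with pathlib, so the same code works on Windows, macOS and Linux."""
    base = Path(root) if root is not None else Path.cwd()
    return base.joinpath(*parts)

def run_metadata(seed):
    """Who, what, when, where, why and how, to be stored next to the results."""
    return {
        "who": os.environ.get("USER") or os.environ.get("USERNAME", "unknown"),
        "what": "example analysis",                 # hypothetical description
        "when": datetime.now(timezone.utc).isoformat(),
        "where": platform.platform(),
        "why": "illustrate seeds and provenance",   # hypothetical motivation
        "how": {"python": platform.python_version(), "seed": seed},
    }

rng = random.Random(SEED)  # seeded generator: identical draws on every run
draws = [rng.gauss(0, 1) for _ in range(3)]

raw_file = project_path("data", "raw", "survey.csv")  # hypothetical file layout
print(raw_file)
print(json.dumps(run_metadata(SEED), indent=2))
```

Using a local `random.Random` instance rather than the module-level functions keeps the seeded stream from being disturbed by other code that also draws random numbers.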

Reproducibility is good for science and good for the individual.

Research

Sharing Code and Data

Publicly sharing code as well as data: importance of documentation & testing.
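As a small illustration of documentation and testing travelling with shared code, here is a hypothetical helper with a docstring that states inputs, outputs and failure modes, plus two tests. Named `test_*`, the functions can be collected by a test runner such as pytest if they live in a test file. Here they are simply called directly.

```python
import math
import statistics

def standardise(values):
    """Return the z-scores of `values`.

    Parameters
    ----------
    values : sequence of float
        At least two observations, not all equal.

    Returns
    -------
    list of float
        (x - mean) / sd for each x, using the sample standard deviation.

    Raises
    ------
    ValueError
        If all values are equal, so the z-scores are undefined.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("values are all equal; z-scores are undefined")
    return [(x - mean) / sd for x in values]

def test_standardise_has_zero_mean_and_unit_sd():
    z = standardise([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(statistics.mean(z), 0.0, abs_tol=1e-12)
    assert math.isclose(statistics.stdev(z), 1.0)

def test_standardise_rejects_constant_input():
    try:
        standardise([5.0, 5.0, 5.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant input")

if __name__ == "__main__":
    test_standardise_has_zero_mean_and_unit_sd()
    test_standardise_rejects_constant_input()
    print("all tests passed")
```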

Literate Programming for Papers

Writing Software

  • Why: People have finite energy. Make it easy to recreate, use and extend your work.
  • Who: Specialists in other areas, statisticians, future you.
  • Examples: spatial statistics, changepoints and HMMs.

Teaching

Teaching Materials

Teaching materials made with literate programming or WYWIWYG:

  • Text-based, so plays nicely with version control
  • Multiple outputs from same source (html, pdf, slides)
  • DRY principle for assessments
  • Leading by example

Individualised Assessments

Steal from reproducible reporting to give each student their own dataset to analyse and produce individualised mark schemes.
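A minimal sketch of the idea, assuming each student is identified by a unique ID string. The ID is turned into a stable seed, that seed drives a simple simulated dataset, and the same generator produces the model answers, so the mark scheme cannot drift from the data the student received (the DRY principle again). The data-generating process, the function names and the IDs are hypothetical.

```python
import hashlib
import random
import statistics

def student_seed(student_id, assessment="assignment-1"):
    """Derive a stable integer seed from a student ID.

    The built-in hash() of a string is randomised per Python process by
    default, so a SHA-256 digest is used to get the same seed on every run.
    """
    key = f"{assessment}:{student_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest()[:16], 16)

def student_dataset(student_id, n=30):
    """Hypothetical data-generating process y = a + b*x + noise, with
    student-specific a and b so that no two students get the same data."""
    rng = random.Random(student_seed(student_id))
    a = round(rng.uniform(1, 5), 1)
    b = round(rng.uniform(-2, 2), 1)
    xs = [round(rng.uniform(0, 10), 2) for _ in range(n)]
    ys = [round(a + b * x + rng.gauss(0, 1), 2) for x in xs]
    return xs, ys

def mark_scheme(student_id):
    """Model answers computed from exactly the data the student receives."""
    xs, ys = student_dataset(student_id)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {
        "student": student_id,
        "mean_y": round(my, 2),
        "slope": round(slope, 3),                # least-squares slope
        "intercept": round(my - slope * mx, 3),  # least-squares intercept
    }

for sid in ["s1234567", "s7654321"]:  # hypothetical student IDs
    print(mark_scheme(sid))
```

In practice each `xs, ys` pair would also be written out to a per-student file for distribution. Because everything is regenerated from the ID, nothing but the ID list needs to be stored.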

Teaching reproducibility

  • Training current / future colleagues.

  • Good programmes cover reproducible modelling

    • Need to consider reproducible workflow
    • Data acquisition, cleaning, documentation, naming, reporting, automation, scaling.
  • Notebooks give a controlled environment, but there is more need for scripting and literate reporting.

    • HARD because of the mix of languages / OS / backgrounds.



Service

Reproducibility in the broader sense

We can also take a broader view of reproducibility: automating the monotonous, time-consuming jobs you have to do repeatedly.


From my experience:

  • Scripting: file organisation and the LMS (learning management system).
  • Individual feedback: forms generated from a rubric spreadsheet. (!)
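A minimal sketch of rubric-to-feedback automation, assuming the rubric spreadsheet is exported to CSV with one row per student, one column per criterion and a free-text comment column. The column names, maximum marks and student rows are hypothetical. `write_feedback` shows how each form could be saved as its own file.

```python
import csv
import io
from pathlib import Path

# Hypothetical rubric export: one row per student, one column per criterion.
RUBRIC_CSV = """student_id,Code quality,Interpretation,Presentation,comment
s1234567,4,3,5,Clear report; justify the model choice.
s7654321,2,4,3,Comment your code and set a seed.
"""

MAX_MARKS = {"Code quality": 5, "Interpretation": 5, "Presentation": 5}  # hypothetical

def feedback_form(row):
    """Render one student's feedback form as plain text."""
    lines = [f"Feedback for {row['student_id']}", ""]
    total = 0
    for criterion, max_mark in MAX_MARKS.items():
        mark = int(row[criterion])
        total += mark
        lines.append(f"{criterion}: {mark}/{max_mark}")
    lines += ["", f"Total: {total}/{sum(MAX_MARKS.values())}", "", f"Comments: {row['comment']}"]
    return "\n".join(lines) + "\n"

def write_feedback(rubric_file, out_dir):
    """Write one feedback file per rubric row into `out_dir`; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(rubric_file):
        path = out_dir / f"{row['student_id']}_feedback.txt"
        path.write_text(feedback_form(row), encoding="utf-8")
        written.append(path)
    return written

# Preview the forms without touching the file system.
for row in csv.DictReader(io.StringIO(RUBRIC_CSV)):
    print(feedback_form(row))
```

Keeping the marks in the spreadsheet and the layout in code means a change to the rubric or the form template is made once and applied to every student.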

Wrapping up

  1. Replication requires structural and cultural change, but reproducibility starts with you.
  2. Reproducibility is not just good science, it’s in your own self-interest.
  3. Teaching reproducibility reproducibly is essential (and fun), but challenging.
  4. Get inventive with automation and reproducibility.

Citations

Broman, Karl, Mine Cetinkaya-Rundel, Amy Nussbaum, Christopher Paciorek, Roger Peng, Daniel Turek, and Hadley Wickham. 2017. “Recommendations to Funding Agencies for Supporting Reproducible Research.” In American Statistical Association, 2:1–4.